Cost-Effective Feature Selection and Ordering for Personalized Energy Estimates
Abstract
Background: Selecting homes with energy-efficient infrastructure is important for renters, because infrastructure influences energy consumption more than in-home behavior does. Personalized energy estimates can guide prospective tenants toward energy-efficient homes, but this information is not readily available. Utility estimates are not typically offered to house-hunters, and existing tools such as carbon calculators require users to answer prohibitively many questions, some of which may take considerable research to answer.

Aim: We want to provide prospective tenants with personalized predictions of energy consumption that are certain (have narrow prediction intervals), accurate (are close to the true values), and cheap (place minimal burden on the user in answering questions). Giving an energy estimate for a household requires eliciting information about that household; our goal is to order questions strategically so that the most informative features are answered first, giving confident predictions with minimal user burden.

Data: The Residential Energy Consumption Survey (RECS) records energy consumption by fuel type and around 500 household features for homes across the U.S. We use this dataset to learn relationships between household features and energy consumption, and then to predict energy usage for prospective tenants. We restricted our analysis to the 2470 homes in the same climate zone as Pittsburgh.

Methods: For the task of providing personalized utility estimates to prospective tenants, we present a cost-based model for feature selection at training time, where all features are available and the cost assigned to each feature reflects the difficulty of acquiring it. At test time, we have immediate access to some features, but others are difficult (costly) to acquire. In this limited-information setting, we strategically order the questions we ask each user, tailoring them to the information already provided, to give the most certain predictions while minimizing the cost to users.

Results: During the critical first 10 questions that our approach selects, prediction accuracy improves at a rate comparable to fixed-order approaches, but prediction certainty is higher. We can then make useful predictions at only 30% of the cost of the full-feature model.

Conclusion: Dynamically ordering questions at test time allows prospective tenants to receive certain and accurate estimates of energy consumption in potential future homes, without requiring them to collect prohibitively costly amounts of detailed information about each unit.
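The idea of ordering questions by informativeness and cost can be illustrated with a small greedy sketch. This is not the authors' model: the toy homes, the feature names, the costs, and the use of the standard deviation of matching homes as a stand-in for prediction-interval width are all assumptions made for illustration only.

```python
from statistics import pstdev

# Hypothetical toy training set: (household features, yearly energy use in kWh).
HOMES = [
    ({"heating": "gas", "size": "small", "windows": "double"}, 6000),
    ({"heating": "gas", "size": "large", "windows": "single"}, 11000),
    ({"heating": "electric", "size": "small", "windows": "double"}, 9000),
    ({"heating": "electric", "size": "large", "windows": "single"}, 15000),
    ({"heating": "gas", "size": "small", "windows": "single"}, 7000),
    ({"heating": "electric", "size": "large", "windows": "double"}, 14000),
]

# Hypothetical acquisition costs: a higher cost means the question is harder to answer.
COSTS = {"heating": 1.0, "size": 1.0, "windows": 3.0}


def pool(known):
    """Training homes that agree with every answer given so far."""
    return [(x, y) for x, y in HOMES if all(x[f] == v for f, v in known.items())]


def spread(values):
    """Stand-in for prediction uncertainty: population std of energy use."""
    return pstdev(values) if len(values) > 1 else 0.0


def expected_spread(known, feature):
    """Average spread after asking `feature`, weighted by how often each answer occurs."""
    homes = pool(known)
    if not homes:
        return 0.0
    groups = {}
    for x, y in homes:
        groups.setdefault(x[feature], []).append(y)
    return sum(len(g) / len(homes) * spread(g) for g in groups.values())


def next_question(known):
    """Greedily pick the unasked feature with the largest uncertainty reduction per unit cost."""
    candidates = [f for f in COSTS if f not in known]
    if not candidates:
        return None
    current = spread([y for _, y in pool(known)])
    return max(candidates, key=lambda f: (current - expected_spread(known, f)) / COSTS[f])


if __name__ == "__main__":
    known = {}
    user = {"heating": "gas", "size": "small", "windows": "double"}  # simulated answers
    while (q := next_question(known)) is not None:
        known[q] = user[q]
        print(q, "->", known[q])
```

Because the choice is recomputed after every answer, the order can differ from user to user, which is the sense in which the ordering is dynamic rather than fixed.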
Similar resources
Bridging the semantic gap for software effort estimation by hierarchical feature selection techniques
Software project management is one of the significant activities in the software development process. Software Development Effort Estimation (SDEE) is a challenging task in software project management. SDEE has been practiced in the computer industry since the 1940s and has been reviewed several times. An SDEE model is appropriate if it provides the accuracy and confidence simultaneously before softwa...
Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection
Due to the rise of technology, the possibility of fraud in different areas such as banking has increased. Credit card fraud is a crucial problem in banking and its danger is ever increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...
A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection
Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that ...
Personalized medicine approach in dental management, abilities and challenges: A Review Article
Background and Aims: Advances of genetic science in genomic techniques have led to the introduction of new diagnostic systems for studying diseases and treatment efficacy. In this science, which is named "Personalized Medicine", human genetic structure is used for evaluation of the diagnosis, treatment, and prevention of disease. Regarding the limited number of studies regarding this issue in Oral & Maxillo...
Feature selection using genetic algorithm for classification of schizophrenia using fMRI data
In this paper we propose a new method for classification of subjects into schizophrenia and control groups using functional magnetic resonance imaging (fMRI) data. In the preprocessing step, the number of fMRI time points is reduced using principal component analysis (PCA). Then, independent component analysis (ICA) is used for further data analysis. It estimates independent components (ICs) of...